Natural Language Processing for Improving Textual Accessibility (NLP 4 ITA) Workshop Programme

Authors

  • María Jesús Aranzabe
  • Arantza Díaz de Ilarraza
  • Itziar Gonzalez-Dios

Abstract

Analysing long sentences is a source of problems in advanced applications such as machine translation. To address these problems, we analysed the long sentences in two corpora written in Standard Basque with a view to syntactic simplification. The results of this analysis led us to design an approach for producing shorter sentences from long ones. To perform this task, we present an architecture for a text simplification system built on previously developed general-coverage tools (giving them a new use) and on hand-written rules specific to syntactic simplification. Since Basque is an agglutinative language, these rules are based on morphological features. In this work we focus on specific phenomena: appositions, finite relative clauses and finite temporal clauses. The proposed simplification does not exclude any target audience and could serve both human readers and machines. This is the first proposal for automatic text simplification for Basque, and it opens a new research line in Basque NLP.
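The abstract describes hand-written simplification rules keyed on morphological features. As a rough illustration only, the Python sketch below shows how one such rule might split a sentence-initial finite temporal clause (marked in Basque by the suffix -(e)nean, "when") off a sentence. The `Token` class, the feature names `subord` and `main_form`, and the inserted connective *Orduan* ("then") are hypothetical choices made for this example. They are not the authors' rules or any real analyser's tag set. The sketch also assumes that the temporal clause opens the sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    # Hypothetical morphological features, assumed to come from an upstream
    # Basque analyser (tag names here are illustrative only).
    feats: dict = field(default_factory=dict)


def split_temporal(tokens):
    """Split one sentence-initial finite temporal clause into its own sentence.

    Hypothetical assumption: the verb form carrying -(e)nean is tagged
    subord="temporal" and also stores its plain finite form in "main_form".
    """
    for i, tok in enumerate(tokens):
        if tok.feats.get("subord") == "temporal":
            # Basque is verb-final, so the subordinate clause ends at the marked
            # verb form; swap it for the plain main-clause form.
            clause = [t.form for t in tokens[:i]] + [tok.feats["main_form"]]
            rest = tokens[i + 1:]
            # Drop the comma that separated the clause from the main clause.
            if rest and rest[0].form == ",":
                rest = rest[1:]
            # Hypothetical connective that keeps the temporal link explicit.
            main = ["Orduan", ","] + [t.form for t in rest]
            return [clause + ["."], main]
    # No rule fired: return the sentence unchanged.
    return [[t.form for t in tokens]]


def detokenize(words):
    # Join words with spaces, attaching punctuation to the preceding word.
    out = ""
    for w in words:
        if w in {".", ","} or not out:
            out += w
        else:
            out += " " + w
    return out


if __name__ == "__main__":
    # "Etxera iritsi zenean, afaria prestatu zuen." (When he got home, he made dinner.)
    sentence = [
        Token("Etxera"), Token("iritsi"),
        Token("zenean", {"subord": "temporal", "main_form": "zen"}),
        Token(","), Token("afaria"), Token("prestatu"), Token("zuen"), Token("."),
    ]
    for s in split_temporal(sentence):
        print(detokenize(s))
    # Etxera iritsi zen.
    # Orduan, afaria prestatu zuen.
```

In a full system the features would presumably come from a morphological analyser rather than being written by hand. Comparable rules would also be needed for appositions and finite relative clauses.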

Similar resources

NLP 4 ITA 2013 Proceedings of the Second Workshop of Natural Language Processing for Improving Textual

Persons affected by Autism Spectrum Disorders (ASD) present impairments in social interaction. A significant percentage of them have inadequate reading comprehension skills. In the ongoing FIRST project we build a multilingual tool called Open Book that helps people with ASD to better understand texts. The tool applies a series of automatic transformations to user documents to identify and r...

NLP 4 ITA 2013

This paper discusses user study outcomes with teachers who used Language Muse SM, a web-based teacher professional development (TPD) application designed to enhance teachers’ linguistic awareness and support teachers in the development of language-based instructional scaffolding (support) for their English language learners (ELL). System development was grounded in literature that supports the n...

Entailment Analysis for Improving Chinese Recognizing Textual Entailment System

Recognizing Textual Entailment (RTE) is a new research issue in natural language processing (NLP) research area. RTE can be a useful component in many NLP applications. In this paper, we introduce our finding on the entailment analysis of the NTCIR-10 RITE-2 dataset, and use the observation to improve our system. In the previous works, all the input pairs are treated equally in a standard class...

Topic Modeling for the Social Sciences

As textual datasets grow in size and scope, social scientists need better tools to help make sense of that data. Despite the natural applicability of topic modeling to many such problems, word counts and tag clouds are often used as the primary means of gleaning information from textual data. We characterize two barriers to adoption encountered during a collaboration between the Stanford NLP gr...

A Comprehensive NLP System for Modern Standard Arabic and Modern Hebrew

This paper presents a comprehensive NLP system by Melingo that has been recently developed for Arabic, based on Morfix – an operational formerly developed highly successful comprehensive Hebrew NLP system. The system discussed includes modules for morphological analysis, context sensitive lemmatization, vocalization, text-to-phoneme conversion, and syntactic-analysis-based prosody (intonation) ...

Publication date: 2012